An efficient hybrid stock trend prediction system during COVID-19 pandemic based on stacked-LSTM and news sentiment analysis

The coronavirus is an irresistible virus that generally influences the respiratory framework. It has an effective impact on the global economy specifically, on the financial movement of stock markets. Recently, an accurate stock market prediction has been of great interest to investors. A sudden change in the stock movement due to COVID -19 appearance causes some problems for investors. From this point, we propose an efficient system that applies sentiment analysis of COVID-19 news and articles to extract the final impact of COVID-19 on the financial stock market. In this paper, we propose a stock market prediction system that extracts the stock movement with the COVID spread. It is important to predict the effect of these diseases on the economy to be ready for any disease change and protect our economy. In this paper, we apply sentimental analysis to stock news headlines to predict the daily future trend of stock in the COVID-19 period. Also, we use machine learning classifiers to predict the final impact of COVID-19 on some stocks such as TSLA, AMZ, and GOOG stock. For improving the performance and quality of future trend predictions, feature selection and spam tweet reduction are performed on the data sets. Finally, our proposed system is a hybrid system that applies text mining on social media data mining on the historical stock dataset to improve the whole prediction performance. The proposed system predicts stock movement for TSLA, AMZ, and GOOG with average prediction accuracy of 90%, 91.6%, and 92.3% respectively.

and stop words. In the fundamental analysis approach, natural languages should be used to preprocess the text data [1].
Financial time series analysis and prediction, especially stock price prediction is one of the effective fields of research and has attracted many scientists and experts. Several techniques have been developed to predict the nonlinearity of time series and improve the accuracy of stock trend prediction either using machine learning or deep learning. As the stock data is noisy and volatile by its nature, there is a need for an effective model working with historical data. There is a huge amount of news and articles that describe the effect of Covid-19 on the economy, particularly stock development [1,21]. Therefore, the use and employment of this news can be attractive in future trend prediction for stock movement. It is greatly important to develop a model that can predict the impact of new diseases on the economy such as Covid-19, as this can help in protecting the economy from loss. Also, this operation is more effective for investors, as it helps investors from protecting their money. But, for working with news and text data, it is a large problem with storage requirements. For this problem, it is necessary, to develop a model to work with news and text data without storage requirements. This problem is solved by our proposed system which works with real-time news headlines.
In conclusion, this paper presents a hybrid automatic system that is composed of different modules such as the historical analysis module, sentiment analysis module, mixed analysis module, and final analysis module to predict the market trend during the present COVID- 19 Pandemic. The contribution of this paper can be summarized as follows: & Presents the concepts of sentiment analysis and stock prediction. & Provide an efficient module for stock future trend prediction through news sentiment via real-time news and apply machine learning classifiers on stock news with an efficient feature selection algorithm to reduce the data dimensionality of the acquired data. & Provide effective modules for stock historical future trend prediction purposes by providing the following: -Historical data analysis by applying machine learning classifiers to stock historical data with an efficient feature selection and data normalization process. stock historical future trend prediction through historical data analysis plus news analysis and applying ML classifiers and an efficient feature selection algorithm to mixed data.
& Propose a proficient hybrid system for the prediction of stock trends and movement utilizing and exploiting LSTM with Sentiment techniques. That use receives the output of all the previous modules and applies DL classifiers such as Stacked-LSTM. & Conducting a systematic evaluation of the proposed system through different experiments over various datasets. From the results, the proposed system showed a good performance in a comparison with the existing methods.
The rest of this paper is prepared as follows: Section 2 presents a concise foundation on stock prediction in addition to sentiment analysis. Section 3 discusses the related research in the stock prediction domain. Section 4 introduces the proposed stock prediction model while Section 5 demonstrates the experimental environmental study. The results of our experiments are presented and discussed in Section 6 while this paper provides its conclusion in Section 7.
In this part, we introduce an overview of sentiment and stock analysis, and some machine and deep learning algorithms. The stock market allows stockholders from owning shares of public companies through trading either by exchange or over-the-counter markets. This market has given investors the chance of achieving profit and having a prosperous life through investing small initial amounts of money [19]. The ability to predict stock prices is an important issue regarding the academic area as well as a business [19]. The stock market is the area where stocks can be transferred, traded, and distributed. It has been considered an important channel for large companies to raise funds from stockholders. On the one hand, through the issuance of stocks, a large amount of capital flows into the stock market, which enhances the organic configuration of corporate capital by promoting capital Concentration [19]. But prediction process of the stock price and stock future trend is generally regarded as a difficult and complex process due to stock properties such as noise and volatility. However, the development mechanism of stock Trend prediction is quite complicated. As it requires mixing in use between the various factors and the special behavior of individual factors, including political, economic, and market factors as well as technology and investor behavior, which will all lead to changes in stock prices [19].
In the current stock markets, the opinion of customers and published news whether positive or negative is a remarkable indicator of the future value of that stock [19]. Sentiment analysis for these data is a critical process that affects the next movement of this stock. Therefore, this process must be done accurately and efficiently [19]. The accurate prediction of a stock's future price will improve stockholders' profit [19]. As, there are a huge amount of data generated by the financial sector like client data, logs from their financial products, and operations data. That can be integrated with external data, like social media data and data from websites to help in the decision-making process [19]. Sentiment analysis is a useful tool used for stock price prediction. By analyzing the historical data of the stock and/ or online data related to the stock to predict the future price of this stock. Also, it can be the best step for the recommendation system that can recommend the best time for selling and buying the stock [24]. By analyzing the financial news through the sentiment analysis process, the system can determine the direction of the movement of the stock. Sentiment analysis provides automatic extraction of views, and emotions from the opinionated contents. It is used to automatically extract views, attitudes, and emotions from opinionated content.

Stock analysis
The financial exchange is an energetic segment of a nation's economy. It is perhaps the biggest open door for the venture by organizations and financial specialists. An organization can increase cash by growing its business through Initial Public Offerings. It is a decent ideal opportunity for a speculator to buy new stocks and accomplish additional benefits from profits offered in the organization's reward program [1]. Therefore, it is important to extract the impact of important news such as COVID news on these stocks. This process can be done by applying sentiment analysis on social media news and articles to extract the news impact on these stocks.
Stockbrokers need to foresee patterns in financial exchange conduct to take the right choice about the process of selling, holding the stock they possess, or buying other stocks [12]. In the case of a positive trend, the price of a stock will be upgraded, so an investor should buy this stock. Else, in the case of a negative trend, as the stock price will be downgraded, so it is best for an investor to sell this stock. Otherwise, an investor should hold his stock if its future trend is neural. Along these lines, the forecast of future financial exchange patterns is significant for the dynamics of stockbrokers. In any case, securities exchanges are unstable, and it is difficult to anticipate, and outer elements, similar to web-based media and day-by-day money-related news, influence stock costs on the double emphatically or adversely. These variables must be scholarly for the exact financial exchange forecast [6]. Many machine learning models and neural networks are widely used in stock prediction processes such as RF and Stacked -LSTM which are used in our proposal.

Sentiment analysis
It is a well-known fact that news trends highly affect people's mentality [13,24]. In the last decade, sentiment analysis becomes an effective research area due to the accessibility of an immense measure of literary information via online media and news stages [27,31]. This printed information can be examined for finding the assessments of clients for various application zones. For assessment examination of this enormous volume of literary information, information mining and AI convey extraordinary significance. The process that uses machine learning and deep learning algorithms to determine what is positive and negative in the text contents is known as sentiment analysis. It is an operation of extracting meaningful information from unstructured text. Text mining is the process of extracting relevant words and relations between them to facilitate their classification. So, sentiment analysis can be used in text mining. The basic function of sentiment analysis is deriving the feeling from the body of the text [23]. Likewise, it can be used in the opinion mining process, as opinion mining is the extraction, analysis, and classification of multiple opinions from several online news sources [5,7,18,34].
The steps of the general sentiment analysis process [3] as shown in Fig. 1 are: a) Data Acquisition: It is the step of collecting text data to build a proper dataset suitable for later processing. There are different sources to collect news and articles such as Twitter, Finviz, Facebook, etc. [3,5,18]. b) Text Preprocessing: This stage is a significant and effective operation, where the unwanted and useless text is removed such as stop words (regular words with no semantics and which don't contain any pertinent data to the errand, for example, "a", "an"), stemming (grammatically comparative words, for example, plurals, verbal varieties that give similar importance), evacuation of URL, and so forth After this progression, every content componenta sentence on account of text rundownis considered as an N-dimensional vector [3,5,18]. Therefore, it is conceivable to utilize some measurements in this space to gauge the comparability between text components. c) Feature Selection and Extraction: Proper selection and extraction of features play a key role in moving forward the exactness of the model. Hence, a suitable feature extraction technique must be chosen for extracting the features [3]. d) Sentiment Classification: In this step, various sentiment classification techniques are applied to classify the text. Some popular sentiment classification techniques are Naïve Bayes (NB) and Support Vector Machines (SVM). It classifies the entire documents or text or review according to the opinions towards certain objects [3]. e) Polarity Detection: After classifying the sentiments, the polarity of the sentiment is determined. The goal of polarity detection is to provide the impact of the sentiment positive, negative, or neutral sentiment. If the polarity is positive, then it is a powerful indicator of the discussed object. Else, it is negative, so it is a frustrating indicator for this object. Else, it is no new effect on this object. f) Validation and Evaluation: Finally, validation and evaluation of the obtained results are performed to determine the overall accuracy of the techniques used for sentiment analysis. In this step, some evaluation metrics are used to determine the quality of the sentiment analysis process such as accuracy and error metrics.

Machine learning and stock analysis
Machine learning is widely used in prediction and recommendation processes such as linear regression, random forest, decision tree, K-nearest neighbors, and neural networks such as convolutional neural networks (CNN), and recurrent neural networks such as Long Short-Term Memory (LSTM). Stacked-LSTM, and Bidirectional LSTM as discussed in the survey

Sentiment Classification
Polarity Detection

Validation and Evaluation
Data Acquisition Fig. 1 Steps of Sentiment Analysis that is introduced by authors in [19]. Our System is based on using random forest and stacked LSTM in the analysis and processing process. Random Forest creates an ensemble of random trees. The number of trees is determined by the user. Each tree performs the prediction process for the same input. In the end, RF determines the output based on the voting process for all trees [19]. Random Forest is defined as a classifier consisting of a collection of tree-structured classifiers. The main idea of RF is that operating many relatively uncorrelated models (trees) will outperform any of the individual constituent models [19]. The output class is the same output of the tree that has the majority. The operation of RF is shown in Fig. 2. Also, there is several Neural networks model that can be used in the stock analysis process for the stock price and/ or stock trend prediction process such as Convolutional neural network (CNN), and Recurrent Neural Networks such as Long -Short Term Memory (LSTM), Stacked -LSTM and Bidirectional LSTM. In Our Model, Stacked -LSTM is used for processing and analysis besides RF. The operation of Stacked-LSTM is described in Fig. 3. The Stacked LSTM is an extension of LSTM that has multiple hidden LSTM layers where each layer contains multiple memory cells. As In LSTM, there is a single hidden layer, LSTM is appeared to solve CNN problem as in CNN, there is no feedback from output to input. To solve this problem within CNN, Overfitting and/or underfitting problems appeared. RNN solves the overfitting and/ or underfitting problems of CNN. But RNN faces a problem in storing the input states for a long period. So, there is a need for LSTM that can store input for a long using a special unit called memory/ state unit [15]. LSTM stores the previous states of input in its memory cell, so it can work efficiently with long term models and solves the long-term problem of RNN. The LSTM can delete or add information from or to the cell state, efficiently performed by structures called gates. LSTM gates are a method to pass information through. They consist of a sigmoid neural network layer and a pointwise multiplication operation. An LSTM is composed of three gates, to monitor and protect the cell state.
As Stacked-LSTM is an advanced module of LSTM with several hidden layers to enhance the training performance. This stacked-layers mechanism can improve the power of neural  In the last decade, sentiment analysis has been an effective research area for the accessibility of a colossal sum of literary information on social media and news stages. Also, many types of research are concentrated on the stock prediction process. In addition to some recent research that concentrates on the effect of Covid-19 on the economy especially stock. In this paper we introduce some of them as follows: & Sentiment Analysis: In 2021, In addition to [5], the authors presented a framework that uses sentiment analysis to extract the impact of covid19. In 2020, [10] the authors use machine learning techniques with social media news to extract how the data influences stock movement. Also, [9] the authors proposed a prediction model where the authors proposed a prediction model for stock price based on sentiment analysis of news. In 2019, [35] the authors proposed a model that extracts the relationship between online posts and stock movement. Finally, in [14] the authors proposed a model with a combination of specialized and crucial examination through the application of information science and machine learning methods for stock advertising forecast. & Stock Analysis: In 2020, [29] the authors proposed a model for stock cost expectation based on LSTM neural network and PCA feature selection algorithm. Also, in [25] the authors proposed a prediction method known as a cross-reference to exchange-based stock trend prediction that predicts the future trend of stock based on daily stock data using the LSTM neural network. A deep learning-based framework is proposed in [32], which predicts stock movement using a recurrent neural network (RNN) and graph convolutional network. In addition to [4] where the researchers proposed the Stacked LSTM and Multilayer perceptron (MLP) model to enhance the accuracy of stock prediction. Finally, the authors [16] applied wavelet transform to stock data to denoise it and improve stock prediction accuracy. The researchers [30] introduced a Stock Prediction model based on Stacked LSTM and Multilayer perceptron (MLP) to enhance the accuracy of stock prediction. The authors [26] proposed a stock prediction model based on LSTM and an attention technique then uses the wavelet transform to clear noise from historical stock data. In 2019, the authors [33] proposed an associated deep recurrent neural network mode based on RNN such as LSTM to enhance Prediction accuracy using many inputs and many outputs. Finally, in [15] The authors proposed an algorithm using the market data to predict the share price using a recurrent neural network such as Long Short-Term Memory, in that process weights are corrected for each data point using stochastic gradient descent. & COVID-19 and stock analysis: In 2020, [17] the authors presented a statistical study of Covid-19 impact using sentiment analysis of the Twitter message. Also, in [22], the authors proposed a model for stock price prediction using Covid-19 & stock index analysis. In addition to [2], the authors presented a study based on forecasting Covid-19 cases and Spain's stock. Using the Suttearima method.
It's perceived that several authors introduced much research on stock analysis, sentiment analysis, and Covid-19 impact. Most of the techniques develop stock price analysis models using either ML or DL techniques and some researchers develop prediction models based on sentiment analysis of social news.
Finally, with the appearance of Covid-19, some researchers discuss the impact of Covid-19 on the world, especially on economic movement. But, in this paper, we propose a hybrid system that uses ML and DL models together to enhance prediction accuracy. In addition to using historical data analysis with news sentiment analysis to predict more accurate stock future trends with Covid-19 spread.
In addition to working with real-time news that solves the memory requirements of the sentiment analysis process. Finally, we propose an efficient hybrid system using ML and DL techniques for applying data analysis and real-time news sentiment analysis to predict the future trend of stock during the Covid-19 effect.

Proposed system
This section describes our proposed system; the basic aim of the proposed system is to study the impact of COVID-19 on the economy, especially the stock market. In the last months, there is an appearance of a new disease called Covid-19 that has had a sudden effect overall world. As some stocks face a sudden downgrade in their price due to Covid-19 appearance, while others have an increased price such as medical stocks. During the period of the Covid-19 spread, people are directed to buy more Medicines than normal, which leads to an increase in medical stock prices. Our system can be applied in the case of the appearance of any new disease that threatens the economy of any country efficiently.
In the proposed system, we try to predict the stock's future trend based on the analysis of social media text, financial news, and historical data. The system aims to achieve better prediction accuracy than the others. In this system, we try to employ social media text and money-related news to predict the influence of COVID-19 on some stocks such as TSLA, AMZ, and GOOG. Our system is composed of several sub-processes as shown in Fig. 5. These sub-processes can be described as follows: The detailed description of these steps can be as follows:

a) Future trend prediction based on Covid-19 news headlines Sentiment analysis
This task is the subprocess from the overall proposed system in which sentiment analysis of Covid-19 articles is applied. In this sub-process, we collect some news headlines during covid-19 and analyze them to extract the impact of covid-19 on the stock movement using news headlines.
This sub-process is composed of some steps such as: 1. Data Collection: it is a collection of Covid-19 news with stock from different sources such as Twitter, Market watch, Facebook [18], Google, etc. In our proposed, news in our proposed, news headlines are collected from Finviz manage the news headlines with the date in a table that facilitates the data request for us in our work. In our proposed system, news headlines are requested in real time for the running process. As the text data requires more storage memory, our proposed system solves this problem by requesting the news headlines in real time of the experiment running. 2. Sentiment Analysis: sentiment analysis is applied to this news to extract the sentiments of Covid-19 on the stock. During sentiment, some processes such as Tokenization, parsing, and assignment of sentiment analysis score processes are applied. a) Tokenization is the procedure of breaking up a specified text into small parts called tokens. Tokens can be individual words, phrases, or even whole sentences. In the process, some useless characters like punctuation may be discarded. These tokens are very useful for finding some patterns as well as are considered a base step for stemming and lemmatization. In our proposed system, this step is important for finding patterns that relate to Covid-19 with stock. The tokens usually become the input for the processes like parsing and text mining. b) Parsing is a common programming task that separates the given series of text into smaller components based on some rules. Its application ranges from document parsing to deep learning NLP. In the COVID-19 news headlines sentiment subprocess, we use regular expressions (Regex) for parsing processes that allow us from comparing article words with words such as Covid-19, and stock as important words, and determine other words such as stop words for rejection.  c) Sentiment Score Assignment: In our proposed system, the Sentiment score is a measure of how Covid-19 affects the stock movement. In this step, the polarity of the article text is determined (positive-negative-neural). The output of this step is some values under some parameters as positive(pos.), negative (neg.), neural (neu,), and compound (comp.) that can be used to determine the direction of stock movement. In our proposal, we use a scaling system whereby words commonly associated with having a negative, neutral, or positive sentiment with them are given an associated number such as described in Eq. 1.
The sentiment polarity will be positive if its compound attribute of sentiment score is greater than the threshold like 0.2, else if the compound is lesser than the negative threshold like −0.2, the sentiment polarity will be negative, else it will be neural for zero compound value [10]. Then the trend and future trend of stock movement are determined based on the stock score. As if this score equals 1, then the trend of this stock is positive, and the close price must be at an increasing rate. Else, if this score equals 0, then this stock will hold at its current state (no change). Else, if this score equals −1, then the stock movement will downgrade, and its prices will decrease as described in Eq. 2.

Sentiment Polarity
Where compound is the compound polarity score for each news headline: for neural tweets=words −1 for negative tweets=words Sentiment label : is the assigned label for each news headlines Sentiment Polarity : impact of this text on stock future movement 3. Future Trend Prediction using ML Classifier: at the end of sentiment analysis, its dataset is assigned to the ML classifier to predict the future trend of stock based on its current trend and analysis of the sentiment dataset. In this step, the feature selection algorithm is applied to the sentiment dataset to extract the most important features from the dataset and assign them to an ML classifier. For feature selection, there are four ways as Univariate selection, Recursive Feature Elimination (RFE), Principal Component Analysis (PCA), and choosing important features (feature importance). In our proposed system, we apply the importance of the most important features (SelectKbest) way that produces the best prediction accuracy for our dataset and the others. In our proposed system, we assign a compound variable as an important feature that can be used in stock future trend prediction. In this phase, we use a random forest classifier (RF). Finally, the prediction vector of this subprocess is passed to a global ML classifier that performs the final prediction process and predicts the final Covid-19 impact on the stock movement.

b) Stock Historical Data Analysis
In this sub-process, the analysis of historical data is applied to predict the future trend of stock influenced by Covid-19. In this sub-process, stock future movement prediction is based on the analysis of stock data in the period of covid-19 to extract the impact of this pandemic It is composed of some steps as follows: 1. Data Collection: historical dataset in the period of Covid-19 spread (1/12/2019 to 30/8/ 2020) is collected and downloaded from the yahoo finance website. This dataset contains the date, Open, Low, High, Adj Close, Close, and Volume attributes that describe the stock dataset. From these attributes, we extract the Current _trend and future trends. We apply some operations on stock data to be suitable with the dates of news publishing by using only stock data with the same dates of news headlines and rejecting the others(delete).
The esteem of the Current _trend highlight can be found by subtracting the stock open cost from the near cost on a certain date. The criteria for selecting these values are given within the taking after condition 3 [5].
where Close : is the close price for this day Open : is the open start ð Þprice for this day The value of the Future _trend can be found by subtraction of the stock's current day closing price from the closing price after n days. If the result of subtraction is greater than zero, so there is an upgrade in stock movement and the future _trend will be in the upward pointer. Else, if the result of subtraction equals zero, it means there is no change in the stock state and it is a hold indicator, so its future _trend will be neural. Else, if the result of subtraction is lesser than zero, this is a downgrade in the stock movement and negative future _ trend value as shown in Eq. 4. This attribute is critical and sensitive as it is the target of the prediction process.
Where NCP : is the close price after n days: CCP : is the current=today close price: n: number of some days while the stock exchange process occurred.
2. Data Preprocessing: in this step data normalization is performed to enhance the prediction accuracy. As the Close attribute values are in the range of 336.00 and the Volume attributes are in the range of 336,000 in the stock dataset. The normalization process normalizes all values in the range between 0 and 1. Then feature selection is applied to select important attributes for the data analysis process to reduce the data dimensionality.
In this step, we use feature importance for the feature selection process. Finally, the prediction vector of this subprocess is passed to the global ML classifier.

c) Analysis of mixed Database of Sentiment Analysis and historical data analysis
In this sub-process, we create a stock dataset that involves the previous sub-process dataset (historical data + current _trend + future _trend) and the current _trend attribute from news sentiment. In this process, we try to feed the ML classifier with more data that enhance the prediction accuracy. In this sub-process, we try to enhance accuracy by merging the advantages of three sub-processes. In this step, the ML classifier is fed by a composite dataset and analyzes these data to predict the future trend of stock movement. By selecting some features from our previous dataset (stock data+ sentiment dataset) such as the close price from historical data and compound variable from the sentiment dataset ML classifier predict the future trend. Then the developed dataset is assigned to an ML classifier to predict the future trend. Finally, the future _ trend predictions are passed to the global ML classifier.

d) Prediction of Future trend of stock based on Covid-19 impact
This is the Stacked-LSTM classifier that has its input from the output of the previous sub-processes. This step predicts the future _trend of stock by processing the prediction vectors of the previous sub-processes. In this step we develop a dataset from Date, Close, article predictions, news predictions, historical _ predictions, mixed _predictions, and Future _ trend attributes, where: & News predictions: It is the prediction vector produced from the ML classifier of sentiment analysis of COVID-19 news headlines with stock. & Historical _ predictions: It is the prediction vector produced from the ML classifier of stock historical data analysis. & Mixed _predictions: It is the prediction vector produced from the ML classifier of the mixed historical plus sentiment analysis dataset. Then we apply some preprocessing steps such as data normalization to the newly created dataset to scale our data. The used Stacked-LSTM is composed of three LSTM neural Network layers, the first two layers with 32 units, and the RELU activation function. The last layer is the dense layer that receives the output of the second layer and provides the final prediction class. LSTM is used in many analytical processes and science branches such as [11,28]. The explanation of LSTM basics and operations are described carefully in [19]. Also, the Stacked-LSTM is used in many branches as described in [19].
In our experiments, we use an epoch of 100 and batch_size of 10 where the internal loop of processing for Sacked-LSTM is 100 with 10 samples as data samples for processing. Finally, the output of this process is the final prediction vector of our proposed system that extracts and predicts the accurate future trend of stock related to the Covid-19 spread.

Experimental environment
Setting up the environment Python 3.7 was used as the programming environment. The following packages were installed for the run of our experiment: Tensorflow 1. 13

Data collection
In our experiments, we use historical stock datasets and news headlines, for TSLA stock in the period from 1/12/2019 to the year 2021. This data is described with some attributes such as Open, Low, high, Close, Adj-Close, and Volume as shown in Table 1. Then we request news headlines from finviz.com that collect stock news headlines as ordered in the table with the date and news headlines attribute. For collecting the text or news for TSLA stock, we select the ticker variable with TSLA in our programming. The statistical description of the used TSLA dataset is shown in Table 2.
For GOOG stocks, we download their historical data, from 1/12/2019 to the year 2021. This data is described with some attributes such as Open, Low, high, Close, Adj-Close, and Volume. Then, we request news headlines from finviz.com that collect stock news headlines as ordered in the table with the date and news headlines attribute. For collecting the text or news for GOOG stock, we select the ticker variable with GOOG in our programming. The statistical description of the used GOOG dataset is shown in Table 3.
Finally, for AMZN stock, we download historical and news headlines from 1/12/2019 to the year 2021. It is the same steps of the two previous stocks that are applied to collect the news headlines of AMZN stock. The statistical description of the used AMZN dataset is shown in Table 4. The collected news headlines are described in Fig. 6. But historical data is downloaded from yahoo finance.com and stored on our computer. The covid-19 wave graph that describes the spread of this pandemic in the experiment period is shown in Fig. 7 (https:// time.com/5893916/covid-19-coronavirus-third-wave/).

Data preprocessing
Where historical data is preprocessed to add some additional attributes such as Current _trend, and Future _trends with their calculated values. Then feature selection is applied to reduce data dimensionality. With COVID-19 and stock news headlines are preprocessed and then it is passed to the sentiment analysis module to assign the sentiment score to each headline. The output of the three ML classifier is passed to the data normalization step before assigning it to stacked LSTM. Finally, the output of the preprocessing step in each module is passed to the processing step that applies either the ML or DL model to predict the future trend of a stock.

Data processing
In our experiment, we use Random Forest Classifier in the three sub-processes and the Stacked-LSTM classifier in the final phase of our system to predict the future _trend of the stock. The processing steps are as follows:   & Test the Stacked-LSTM with the test dataset & Then the output of this phase is passed to be evaluated against the actual future _trend to evaluate our proposed system performance via some evaluation metrics such as accuracy, recall, precision, and f1_score.

Performance evaluation metrics
In this paper, our evaluation process involves measuring some metrics such as: & Accuracy: it is a measure to discriminate against the optimal solution during the classification training. It is one of the most common evaluation metrics used practically to evaluate the generalization ability of classifiers. Generally, the accuracy measures the ratio of correct predictions over the total number of evaluated samples as shown in Eq. 5. For our work, the output is multiclass, so the used accuracy measure is the average effectiveness of all classes as described in Eq. 6 [8]. & Recall: it is used to measure the fraction of positive patterns that are correctly classified.
The normal formula of recall is shown in Eq. 7. But, for our multiclass model, we use the averaged recall as shown in Eq. 8 [8]. & Precision: it is used to measure the positive patterns that are correctly predicted from the total predicted patterns in a positive class Error. The normal formula of precision is shown in Eq. 9. But, for our multiclass model, we use the averaged recall as shown in Eq. 10 [8]. & F1_Score: it is the harmonic mean between recall and precision values. The normal formula of precision is shown in Eq. 11. But, for our multiclass model, we use the averaged recall as shown in Eq. 12 [8]. & Error: it measures the system's weakness. It is the ratio of false predictions to whole predictions. It can be calculated as described in Eq. 13. & Confusion Matrix: it is a table layout that allows visualization of the performance of an algorithm, typically supervised learning. Each record of the matrix represents the cases in a predicted class while each column represents the instances in an actual class T pk þ T nk T pk þ T nk þ f pk þ f nk n ð6Þ T pk T pk þ T nk n ð8Þ T pk T pk þf pk n ð10Þ F1 score ¼ 2*precision*recall precision þ recall ð11Þ averaged f 1 score ¼ 2*precisionM *recallM precisionM þ recallM ð12Þ Where: T pk : is the true positive for class C k ð Þ T nk : is the true negative for class C k ð Þ  Table 5 shows the evaluation metrics for our proposed system for step-by-step processing for TSLA stock. The accuracy variation from the first phase until the final phase is shown in Fig. 8. The confusion matrix of the actual future trend and the predicted one is shown in Fig. 9. & Case 2: Table 6 shows the evaluation metrics for our proposed system for step-by-step processing for GOOG stock. The accuracy variation is shown in Fig. 10. The confusion matrix of the actual future trend and the predicted one is shown in Fig. 11. & Case 3: Table 7 shows the evaluation metrics for our proposed system for step-by-step processing for AMZN stock. Figure 12 shows the accuracy variation for this stock. The confusion matrix of the actual future trend and the predicted one is shown in Fig. 13. Table 7 shows the measured performance results for three discussed stocks with our proposed system. Where Fig. 14 shows the different accuracy for TSLA, GOOG, and AMZN stocks. Output results between actual values and predicted values for TSLA, GOOG, and TSLA are forecasted for the test phase as shown in Figs. 15, 16, and 17 consequently. For our proposed system, there are four phases to get the prediction vector (future trend) of stock. Covid-19 shakes the overall world, and it influences society. So, it is necessary to find an efficient system that predicts the future trend of the economy during the COVID effect. In this paper, we introduced an efficient system that predicts the stock's future trend based on the analysis of historical data and social media news.
In this paper, we experimented with our proposed system with three stocks TSLA, GOOG, and AMZN to ensure the efficiency of our system with data variation and its flexibility with data size and dimensionality variation. From the experimental results shown in Table 1, we find the TSLA predicted future trend is close to the actual future trend with an accuracy of 90%. This accuracy is reached as the prediction process is done in three different ways. First, the historical data analysis produces stock future trends with85.7% accuracy, in this phase we anal the size of the historical data during the Covid-19 existence and predicts the future trend. Second, the news headlines sentiment analysis produced the future trend with an accuracy of 57.1. In this phase, we predict the future trend by analyzing the social news that relates between TSLA and Covid-19. Third, the mixed analysis predicts the future trend by analyzing some historical data with news sentiments. This process predicts the future trend with an accuracy of 75%. Finally, the prediction vectors are passed to the Stacked-LSTM that learned this data deeply using three LSTM NN and RELU activation functions. This phase predicts the future trend from the previously predicted trends of three phases with an accuracy of 90%.
In the historical analysis, the preprocessing steps enhance the prediction accuracy such as normalization such as scaling the historical data to the same range of 0&1. Data normalization is necessary for a stock dataset, because of different ranges of data. These different ranges are presented in the Close, and Volume features of the dataset. For TSLA, the Close feature has a  By applying data normalization in our proposed system, we avoid the different range problems and avoid oscillating the data in a different area. Then feature selection preprocessing step is applied to select only more relevant features and reject the others. There are many ways to do this such as Chi-Square, PCA, and feature importance. In our proposed system, we apply the feature importance. This step improves the system performance as it reduces the overfitting chances and reduces the training time. By training with irrelevant features, there is additional training time without the need that may cause the system to be overfitted and more complex. By applying feature selection in our proposed system, we improve the performance.
Using an RF classifier for the three first sub-processes enhances the prediction accuracy, as it is practically proven it is the best than other ML classifiers. RF works as several associated decision trees, then selects the decision tree with the majority. For this, we select RF as an ML classifier to improve our prediction accuracy.
Using three sub-processes for the same goal enhances the prediction accuracy more than using only one of them. The first method is the prediction of a future trend using historical data that predicts the future trend based on the variation of the Close feature. The second method is predicting the future trend based on the sentiment analysis of news (stock & COVID-19). The third method is predicting the future trend during Covid-19 based on the selected features from the merged dataset of historical and sentiments.
Finally, the important step is passing the prediction vectors from the previous subprocess to Stacked-LSTM which learns the data deeply. Using Stacked-LSTM in this step improves prediction accuracy as it deeply learned the prediction vectors to produce a more efficient one.  So, the use of hybridization in our proposed system provides better accuracy than individual models. As hybridization allows our system to perform the prediction process accurately and reduce the errors of individual systems as shown by the result.
Tables 5, 6, 7 and 8 show the performance enhancement produced by using the Stacked -LSTM module for final prediction. From Table 4, we can see the difference between the first  Likewise, we improve performance not only by improving prediction accuracy and avoiding overfitting problems but also by solving the problem of memory storage for sentiment analysis. News sentiment analysis requires more memory for data storage for further analysis, but in our proposed system we apply sentiment analysis for real-time news that requests it for the running process without storage requirements. Finally, in this work, we proposed an automatic prediction system that: & Requires only the URL for news collection without storage requirements. & Requires only a historical stock dataset, as our system preprocesses this data to be suitable for analysis. & Preprocessing news enhances prediction accuracy by rejecting irrelevant words such as stop words, and punctuation.    In this part, we provide a comparison between the proposed system and existing ones for stock future trend prediction during Covid-19. The results are contained in Tables 5, 6, 7 and 8 of our preliminary study, listing the results of learning techniques, through the application of validation and testing for the three stocks, correspondingly, and the results gained from the experiments indicate that Test Accuracy, f1_score, recall, and precision of approach are more accurate and more efficient. Table 9 shows a comparison between our proposed approach and the recent approaches on the same scientific point.

Conclusion
Lately, stock price prediction and the stock future trend have been recognized as very effective and attractive research problems for many researchers around the world. Besides, the sudden appearance of Devastating viruses such as Covid-19 requires automatic analysis and prediction to protect the economy from loss. This paper provides a simple survey of the sentiment analysis process and stock analysis. Then, it introduces an efficient proposed system that is based on sentiment analysis of social news in addition to analysis of stock historical data. Our system is restricted by the stock data variation during the Covid-19 period. Moreover applying the DL on the first phase predictions using Stacked-LSTM. In historical data analysis, we apply some preprocessing steps such as data normalization and feature selection that enhance the prediction performance. Likewise, in sentiment analysis, we apply some text preprocessing steps such as tokenization and parsing processes. Then the Stacked-LSTM module has the function of analysis for prediction vectors and provides more accurate predictions of a future trend. The proposed system is checked and tested on three stocks to evaluate the performance. In summary, the proposed system predicts stock movement for TSLA, AMZ, and GOOG with average prediction accuracy of 90%, 91.6%, and 92.3% respectively.
Availability of data The data is available up to a request from the authors.
Funding Open access funding provided by The Science, Technology & Innovation Funding Authority (STDF) in cooperation with The Egyptian Knowledge Bank (EKB).

Declarations
Conflict of interest The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.